Introduction to Document Analysis and Recognition

Author

  • Simone Marinai
Abstract

Document Analysis and Recognition (DAR) aims at the automatic extraction of information that is presented on paper and was originally intended for human comprehension. The desired output of a DAR system is usually a symbolic representation that can subsequently be processed by computers. Over the centuries, paper documents have been the principal instrument for making the progress of humankind permanent. Even today, most information is still recorded, stored, and distributed on paper. The widespread use of computers for document editing, following the introduction of PCs and word processors in the late 1980s, increased rather than reduced the amount of information held on paper. Even if current technological trends seem to move towards a paperless world, some studies have shown that the use of paper as a medium for information exchange is still increasing [1]. Moreover, there are still application domains where paper remains the preferred medium [2]. The most widely known applications of DAR are the processing of office documents (such as invoices, bank documents, business letters, and checks) and automatic mail sorting. With the current availability of inexpensive high-resolution scanning devices, combined with powerful computers, state-of-the-art OCR packages can solve simple recognition tasks for most users. Recent research directions are widening the use of DAR techniques; significant examples are the processing of ancient and historical documents in digital libraries, information extraction from “digital born” documents such as PDF and HTML, and the analysis of natural images (acquired with mobile phones and digital cameras) that contain textual information. The development of a DAR system requires the integration of competences from several areas of computer science, among them image processing, pattern recognition, natural language processing, artificial intelligence, and database systems.
DAR applications are particularly suitable for the incorporation of

Similar resources

Methodology for Validation of Issuance of Mystical and Ethical Narrations (A Case Study and Discourse Analysis on the Methodology of the Book Sirr ul-asra’)

The book “The Secret of Prophet Mohammad’s Midnight Journey to the Seven Heavens in Explanation of Al-Mi’raj Hadith” was written by Ayatollah Sa’adatparvar. By analyzing the discourse of a part of its introduction, this paper investigates his method of recognizing this hadith. The paper aims at investigating the particular discourse pattern of the author in analyzing the document of ...

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

Automatic recognition of printed texts and manuscripts facilitates the entry of data into the computer and its digitization, and is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

Removing Geometric Distortion from Text Documents Using the Geometric Information of Text Lines

Document images produced by scanners or digital cameras usually have photometric and geometric distortions. If either of these effects distorts a document, recognition of words from the document image using OCR is prone to errors. In this paper we propose a novel approach that significantly reduces geometric distortion in document images. In this method, we first extract document lines from do...

Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system that segments and classifies the contents of Arabic document images. The system includes preprocessing, document segmentation, feature extraction, and document classification. In preprocessing, a document image is enhanced by removing noise, binarizing, and detecting and correcting image skew. In document segmentation, an algorith...

Capability Analyzing of Solar Energy Based on Climatic Criteria Recognition in Iran’s Architectural Design by the Use of Fuzzy Analytical Hierarchy Process Method (FAHP)

Developing a comprehensive document based on the utmost use of renewable energy efficiency in architectural design is the first step at the national level toward the goals of sustainable architecture, and this is not possible without having a deep trend of the climatic compartment. The modeling of comprehensive energy plans in architecture without a quantitative approach is incomple...

Recognition of Sequence of Print and Ink Strokes: Investigation the Effect of Handwriting Pressure, Hue of Ink, Printer and Paper Type

With the introduction of digital techniques, forensic document examiners have been encouraged to work with better accuracy in non-destructive ways. The aim of this study was to present a non-destructive, accessible, affordable, user-friendly, portable, and easy technique for determining the order of crossing lines of ink strokes and printed text. The intersections of LaserJet and In...


Publication date: 2008